
242 research outputs found

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Author: Cheng Xueqi
Humayoo Mahammad
Publication venue
Publication date: 18/07/2019

Off-policy learning is more unstable than on-policy learning in reinforcement learning (RL). One reason for this instability is a discrepancy between the target ($\pi$) and behavior ($b$) policy distributions. The discrepancy between the $\pi$ and $b$ distributions can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter $\beta \in [0, 1]$ that controls smoothness. To cope with instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor) and a value function (the critic) that assesses the current policy ($\pi$) using samples drawn from the behavior policy. We use the action value generated from the behavior policy, rather than from the target policy, in the reward function to train our algorithm. We also use deep neural networks to train both actor and critic. We evaluated our algorithm on a number of OpenAI Gym benchmark problems and demonstrate better or comparable performance to several state-of-the-art RL baselines.
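The abstract does not state the RIS weight itself. As a minimal sketch, assuming the usual relative density-ratio form $w_\beta = \pi(a|s) / (\beta\,\pi(a|s) + (1-\beta)\,b(a|s))$, the following shows how $\beta$ smooths the weight: $\beta = 0$ recovers ordinary importance sampling, and any $\beta > 0$ bounds the weight by $1/\beta$. The function name and the example probabilities are hypothetical and are not taken from the paper.

```python
def relative_importance_weight(pi_prob, b_prob, beta):
    """Assumed RIS form: pi / (beta * pi + (1 - beta) * b).

    pi_prob: probability of the action under the target policy pi
    b_prob:  probability of the same action under the behavior policy b
    beta:    smoothness parameter in [0, 1]
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    denom = beta * pi_prob + (1.0 - beta) * b_prob
    if denom <= 0.0:
        raise ValueError("mixture probability must be positive")
    return pi_prob / denom


# Hypothetical case: an action that is likely under pi but rare under b,
# which is where the ordinary IS ratio pi / b becomes very large.
pi_prob, b_prob = 0.9, 0.01
for beta in (0.0, 0.1, 0.5, 1.0):
    w = relative_importance_weight(pi_prob, b_prob, beta)
    print(f"beta={beta:.1f}  weight={w:.3f}")
```

Under this assumed form, the denominator is at least $\beta\,\pi(a|s)$, which is why the weight cannot exceed $1/\beta$ no matter how small $b(a|s)$ is.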

arXiv.org e-Print Archive

MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching

Author: Cheng Xueqi
Fan Yixing
Guo Jiafeng
Ji Xiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/07/2019

Text matching is the core problem in many natural language processing (NLP) tasks, such as information retrieval, question answering, and conversation. Recently, deep learning technology has been widely adopted for text matching, making neural text matching a new and active research domain. With a large number of neural matching models emerging rapidly, it is becoming increasingly difficult for researchers, especially newcomers, to learn and understand these new models. Moreover, it is usually difficult to try these models because of tedious data pre-processing, complicated parameter configuration, and numerous optimization tricks, not to mention that public code is sometimes unavailable. Finally, for researchers who want to develop new models, it is not easy to implement a neural text matching model from scratch and to compare it with a range of existing models. In this paper, therefore, we present a novel system, namely MatchZoo, to facilitate the learning, practicing and designing of neural text matching models. The system consists of a powerful matching library and a user-friendly, interactive studio, which can help researchers: 1) to learn state-of-the-art neural text matching models systematically; 2) to train, test and apply these models with simple configurable steps; and 3) to develop their own models with rich APIs and assistance.

arXiv.org e-Print Archive

Crossref

Parameter Estimation with the Ordered $\ell_{2}$ Regularization via an Alternating Direction Method of Multipliers

Author: Cheng Xueqi
Humayoo Mahammad
Publication venue: 'MDPI AG'
Publication date: 12/10/2019

Regularization is a popular technique in machine learning for model estimation and for avoiding overfitting. Prior studies have found that modern ordered regularization can be more effective than traditional regularization in handling highly correlated, high-dimensional data. The reason is that ordered regularization can reject irrelevant variables and yield an accurate estimation of the parameters. How to scale up ordered regularization problems when facing large-scale training data remains an unanswered question. This paper explores the problem of parameter estimation with the ordered $\ell_{2}$-regularization via the Alternating Direction Method of Multipliers (ADMM), called ADMM-O$\ell_{2}$. The advantages of ADMM-O$\ell_{2}$ include (i) scaling up the ordered $\ell_{2}$ to a large-scale dataset, (ii) predicting parameters correctly by excluding irrelevant variables automatically, and (iii) having a fast convergence rate. Experimental results on both synthetic and real data indicate that ADMM-O$\ell_{2}$ can perform better than or comparably to several state-of-the-art baselines.
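The abstract does not define the ordered $\ell_{2}$ penalty. As a rough sketch, assuming it follows the ordered $\ell_{1}$ (SLOPE-style) construction with squared magnitudes, $J_\lambda(x) = \sum_i \lambda_i |x|_{(i)}^2$ where $|x|_{(1)} \ge |x|_{(2)} \ge \dots$ and $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$, the penalty value can be computed as below. The function name and the weights are hypothetical, and the ADMM solver itself is not shown.

```python
def ordered_l2_penalty(x, lambdas):
    """Assumed ordered l2 penalty: sum_i lambdas[i] * (i-th largest |x|)**2.

    lambdas must be non-negative and non-increasing, so the largest
    coefficients receive the largest weights.
    """
    if len(x) != len(lambdas):
        raise ValueError("x and lambdas must have the same length")
    if any(lam < 0 for lam in lambdas):
        raise ValueError("lambdas must be non-negative")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be non-increasing")
    # Sort magnitudes from largest to smallest before pairing with weights.
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(lam * m * m for lam, m in zip(lambdas, magnitudes))


# Hypothetical coefficients and weights.
x = [1.0, -3.0, 2.0]
print(ordered_l2_penalty(x, [3.0, 2.0, 1.0]))  # ordered weights
print(ordered_l2_penalty(x, [1.0, 1.0, 1.0]))  # equal weights: plain squared l2 norm
```

With equal weights the assumed penalty reduces to the ordinary ridge term $\lambda \|x\|_2^2$, which is a quick sanity check on the construction.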

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

NSME: a framework for network worm modeling and simulation

Author: Cheng Xueqi
Lin Siming
Publication venue
Publication date: 01/08/2006

Various worms have a devastating impact on the Internet. Packet-level network modeling and simulation has become an approach to finding effective countermeasures against the worm threat. However, current alternatives are not well suited to this purpose. For instance, they mostly focus on the details of the lower layers of the network, so their abstraction of the application layer is very coarse. In our work, we propose a formal description of network and worm models, and define network virtualization levels to differentiate the expressive capability of current alternatives. We then implement a framework, called NSME, based on NS2 for dedicated worm modeling and simulation with more application-layer detail. We also analyze and compare the resulting overheads. The additional real-time characteristics and a worm simulation model are further discussed.

5th IFIP International Conference on Network Control & Engineering for QoS, Security and Mobility

Red de Universidades con Carreras en Informática (RedUNCI)